Constraint-driven co-clustering of 0/1 data
Abstract
We investigate a co-clustering framework (i.e., a method that provides a partition of objects and a linked partition of features) for binary data sets. So far, constrained co-clustering has seldom been explored. First, we consider straightforward extensions of the classical instance-level constraints (Must-link, Cannot-link) to express relationships on both objects and features. Furthermore, we study constraints that exploit sequential orders on objects and/or features: we can specify whether or not the extracted co-clusters should involve contiguous elements (Interval and non-Interval constraints). Instead of integrating constraint processing within a co-clustering scheme, we propose a Local-to-Global (L2G) framework. It consists of post-processing a collection of (constrained) local patterns that have been computed beforehand (e.g., closed feature sets and their supporting sets of objects) to build a global pattern such as a co-clustering. Roughly speaking, the algorithmic scheme is a K-Means-like approach that groups the local patterns. We show that it is possible to push local counterparts of the global constraints on the co-clusters during the local pattern mining phase itself. A large part of the chapter is dedicated to experiments that demonstrate the added value of our approach. Considering both synthetic data and real gene expression data sets, we discuss the use of constraints to get co-clusters that are not only more stable but also more relevant.
Constrained Clustering: Advances in Algorithms, Theory and Applications, © 2007 by CRC Press LLC.
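As a rough illustration of the constraint types named in the abstract, the sketch below checks Must-link, Cannot-link and Interval constraints against a given partition. It assumes a partition is encoded as a list of cluster labels indexed by the position of each object (or feature) in its sequential order. The function names and the encoding are hypothetical and are not taken from the chapter; the same checks would apply to the object partition and the feature partition alike.

```python
from typing import Sequence


def must_link_ok(labels: Sequence[int], a: int, b: int) -> bool:
    """Must-link: elements a and b have to share a cluster."""
    return labels[a] == labels[b]


def cannot_link_ok(labels: Sequence[int], a: int, b: int) -> bool:
    """Cannot-link: elements a and b have to be in different clusters."""
    return labels[a] != labels[b]


def is_interval(labels: Sequence[int], cluster: int) -> bool:
    """Interval: the members of `cluster` are contiguous in the given order.

    An empty cluster is treated as not satisfying the constraint
    (an assumption made for this sketch).
    """
    positions = [i for i, c in enumerate(labels) if c == cluster]
    return bool(positions) and positions[-1] - positions[0] + 1 == len(positions)


# Hypothetical partitions of 5 ordered objects and 4 ordered features.
object_labels = [0, 0, 1, 1, 0]
feature_labels = [1, 1, 0, 0]
```

Under this encoding, a non-Interval constraint on a cluster is simply the negation of `is_interval` for a non-empty cluster.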
Similar resources
Numerical Data Co-clustering via Sum-Squared Residue Minimization and User-defined Constraint Satisfaction
Co-clustering aims at computing a bi-partition, i.e., a collection of co-clusters: each co-cluster is a group of objects associated with a group of attributes, and these associations can support interpretations. We consider constrained co-clustering not only for extended must-link and cannot-link constraints (i.e., both objects and attributes can be involved), but also for interval constraints th...
Using Gaussian Measures for Efficient Constraint Based Clustering
In this paper we present a novel iterative multiphase clustering technique for efficiently clustering high-dimensional data points. For this purpose we implement a clustering feature (CF) tree on a real data set and apply a Gaussian density distribution constraint to the resultant CF tree. The post-processing by the application of the Gaussian density distribution function on the micro-clusters leads to re...
Towards Constrained Co-clustering in Ordered 0/1 Data Sets
Within 0/1 data, co-clustering provides a collection of biclusters, i.e., linked clusters for both objects and Boolean properties. Besides the classical need for grouping quality optimization, one can also use user-defined constraints to capture subjective interestingness aspects and thus to improve bicluster relevancy. We consider the case of 0/1 data where at least one dimension is ordered, e...
A clustering approach for mineral potential mapping: A deposit-scale porphyry copper exploration targeting
This work describes a knowledge-guided clustering approach for mineral potential mapping (MPM), in which the optimum number of clusters is derived from a knowledge-driven methodology through a concentration-area (C-A) multifractal analysis. To implement the proposed approach, a case study at the North Narbaghi region in Saveh, Markazi province, Iran, was investigated to discover porphyry ...
Energy Saving in Kiln Unit of ABYEK CEMENT CO: Data Clustering Approach
The cost of producing cement all over the world depends on the level of wages, energy costs and the availability of raw materials. According to the financial statements of various companies on the stock market, electrical and fuel costs account for nearly 27 percent of total costs, which makes them important in the proper management of energy consumption. In this regard mathematical modeling and...
Publication date: 2008